@TomAugspurger
Contributor

Now that Store.getsize is a thing, we can add info_complete, which includes the number of chunks written and their total size in bytes.

The current implementation unfortunately calls list_prefix twice on the same prefix: first to count the chunks initialized, then to get the bytes stored under that prefix. Unfortunately, these don't compose well, and I haven't thought of a nice way to eliminate the duplication yet. We could do this naively with a single list_prefix, counting the keys as we call getsize on each. But we also have a Store.getsize_prefix for stores that have fast paths for getting the total number of bytes under a prefix. I don't really want a Store.getsize_and_count_prefix, but maybe some kind of Store.statistics_prefix? Probably not worth worrying about today.
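For reference, the naive single-pass variant might look roughly like the sketch below. It runs against a hypothetical in-memory `ToyStore`, not the real `Store` class; only the method names `list_prefix` and `getsize` come from the description above, and their signatures here are assumptions. The helper name `count_and_size` is also made up for illustration.

```python
import asyncio
from typing import AsyncIterator


class ToyStore:
    """Hypothetical in-memory stand-in for a store (not the real Store class)."""

    def __init__(self, data: dict[str, bytes]) -> None:
        self._data = data

    async def list_prefix(self, prefix: str) -> AsyncIterator[str]:
        # Assumed behavior: yield every key that starts with the prefix.
        for key in self._data:
            if key.startswith(prefix):
                yield key

    async def getsize(self, key: str) -> int:
        # Assumed behavior: return the stored size of one key in bytes.
        return len(self._data[key])


async def count_and_size(store: ToyStore, prefix: str) -> tuple[int, int]:
    """List the prefix once, counting keys and summing their sizes."""
    count = 0
    total = 0
    async for key in store.list_prefix(prefix):
        count += 1
        total += await store.getsize(key)
    return count, total


store = ToyStore({"a/c/0": b"1234", "a/c/1": b"12", "b/c/0": b"x"})
result = asyncio.run(count_and_size(store, "a/"))
```

The trade-off is the one noted above: this issues one getsize call per key, so it bypasses any fast path a store might offer through getsize_prefix.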

Contributor

@rabernat left a comment

Thanks so much for this Tom!

The double listing will definitely add some latency. But we can work on that in a future PR.

I could imagine an API like

nitems, total_size = store.statistics(prefix, ("count", "size"))
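To make the shape of that idea concrete, here is a minimal sketch of such a method on a hypothetical in-memory `ToyStore`. Nothing here is an existing API: the `statistics` method, its `fields` argument, and the supported field names `"count"` and `"size"` are all assumptions drawn from the one-liner above.

```python
import asyncio


class ToyStore:
    """Hypothetical in-memory stand-in for a store (not the real Store class)."""

    def __init__(self, data: dict[str, bytes]) -> None:
        self._data = data

    async def statistics(self, prefix: str, fields: tuple[str, ...]) -> tuple[int, ...]:
        # Hypothetical method: compute every requested statistic in one pass
        # and return the values in the order the fields were requested.
        unknown = set(fields) - {"count", "size"}
        if unknown:
            raise ValueError(f"unsupported statistics: {sorted(unknown)}")
        count = 0
        size = 0
        for key, value in self._data.items():
            if key.startswith(prefix):
                count += 1
                size += len(value)
        values = {"count": count, "size": size}
        return tuple(values[name] for name in fields)


store = ToyStore({"a/c/0": b"1234", "a/c/1": b"12", "b/c/0": b"x"})
nitems, total_size = asyncio.run(store.statistics("a/", ("count", "size")))
```

A store with a native fast path could override such a method, while others fall back to a single listing pass.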

@jhamman added the V3 label Nov 29, 2024
@normanrz merged commit 206d145 into zarr-developers:main Nov 29, 2024
26 checks passed